Objective: to work with machine learning algorithms in R for spatio-temporal prediction of variables.
Machine learning algorithms have been widely used:
Machine learning has proven to be useful in many fields. What are the steps to apply machine learning methods?
There is no single best-performing machine learning algorithm; performance depends on the type of data and the final objectives. Broadly, machine learning algorithms fall into two main groups:
The following algorithms are often used for supervised learning:
The following algorithms are often used for unsupervised learning:
In this course, we will focus mainly on supervised learning algorithms as we want to work with regression and classification of variables. There are several R packages related to machine learning:
Random Forest (RF; Biau and Scornet, 2016, Breiman, 2001, Prasad et al., 2006) is an ensemble learning method for supervised classification and regression. It constructs numerous decision trees that model the relationship between independent and dependent variables and aggregates their predictions.
\[ \hat{\theta}^B(x) = \frac{1}{B} \sum_{b=1}^B{t^*_b(x)} \]
where:
\(\hat{\theta}^B(x)\) is the final prediction
\(b\) is the individual bootstrap sample
\(B\) is the total number of trees
\(t^*_b\) is the individual decision tree
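The averaging step can be illustrated with a toy example in base R; the tree predictions below are hypothetical values, not the output of an actual forest:

```r
# Hypothetical predictions t*_b(x) of B = 4 individual trees for one input x
tree_predictions <- c(110, 118, 105, 115)

# The ensemble prediction is the average over all B trees
final_prediction <- mean(tree_predictions)
print(final_prediction)  # 112
```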
Support Vector Machines (SVM; Hearst et al., 1998, Steinwart et al., 2008) aim to find a hyperplane in an N-dimensional space that distinctly classifies the given data points. As many hyperplanes could separate the classes, SVM selects the one that maximises the margin, i.e., the distance to the nearest data points of each class.
Neural Networks (NN), also known as Artificial Neural Networks, are composed of node layers: an input layer, one or more hidden layers, and an output layer. Each node connects to others and has an associated weight and threshold (IBM). If the output of a given node is above the specified threshold, the node is activated and sends data to the next layer.
Now that we have an overview of different machine learning algorithms, let's use them for a regression exercise. The mtcars dataset (included in R) comprises fuel consumption and 10 aspects of automobile design and performance for 32 cars.
print(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
We want to predict the horsepower (hp) of some cars using the other parameters of the cars:
In this exercise we will use Random Forest, Support Vector Machines, and Neural Networks. Therefore, let's load the corresponding packages:
To predict the hp of the cars, let's randomly split the data into a training sample (to train the machine learning algorithms) and a validation sample (to evaluate their performance). For reproducibility, we will set a seed in the code; setting a seed initialises the pseudorandom number generator so that the same "random" numbers are drawn on every run.
To separate the data, let's store the total number of cars in an object.
After this, we can generate a random sample of positions. Let's use 80% of the total number of cars for training purposes.
Now, we can separate the mtcars dataset into training and validation sets.
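The chunk that creates pos_training was not displayed; a minimal sketch of these steps (the seed value 123 is an assumption, so the sampled rows may differ from the output below) is:

```r
# Set a seed so the random split is reproducible (123 is an arbitrary choice)
set.seed(123)

# Total number of cars in the dataset
n <- nrow(mtcars)

# Randomly sample 80% of the row positions for the training set
pos_training <- sample(1:n, size = round(0.8 * n))
```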
training <- mtcars[pos_training,]
validation <- mtcars[-pos_training,]
print(training)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
print(validation)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
The randomForest package implements Breiman’s random forest algorithm for classification and regression.
x: a data frame or a matrix of predictors, or a formula describing the model to be fitted
data: a data frame containing the variables in the model
ntree: Number of trees to grow
mtry: Number of variables randomly sampled as candidates at each split
nodesize: Minimum size of terminal nodes. Setting this number larger causes smaller trees to be grown (and thus take less time)
maxnodes: Maximum number of terminal nodes trees in the forest can have. If not given, trees are grown to the maximum possible (subject to limits by nodesize)
na.action: A function to specify the action to be taken if NAs are found
There are more parameters related to this function. Please type ?randomForest to see them.
In this example, we will use the default values of the function. The formula that will be applied is \(hp\sim.\). This means that \(hp\) is the dependent variable and the \(.\) indicates that all the other variables in the training dataset will be used as predictors.
If we wanted to set particular parameters, we could do it as follows:
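The model-fitting chunk was not displayed; a sketch of the default call and of a call with explicit parameters (the seed and parameter values are illustrative assumptions, not the course's settings) might look like:

```r
library(randomForest)

# Recreate the training set from the earlier split (seed value assumed)
set.seed(123)
pos_training <- sample(1:nrow(mtcars), size = round(0.8 * nrow(mtcars)))
training <- mtcars[pos_training, ]

# Default call: hp is the dependent variable, '.' uses all other columns
rf_model <- randomForest(hp ~ ., data = training)

# Call with explicit (illustrative) parameter values
rf_model_tuned <- randomForest(hp ~ ., data = training,
                               ntree = 1000, # number of trees to grow
                               mtry = 3,     # candidate variables per split
                               nodesize = 5) # minimum terminal-node size
```

For regression, mtry defaults to one third of the number of predictors, so the values above only illustrate the syntax.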
We can print the rf_model object.
To predict the hp values from the validation set, we will use the predict function, which requires the model and the covariates to predict the horsepower of the cars. To make this explicit, we will exclude the column of hp from the data.frame.
We can create a data.frame with the real values (Obs) and the Random Forest predictions (RF).
The function varImpPlot will generate a dotchart of variable importance as measured by a Random Forest.
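A sketch of the prediction, comparison, and importance-plot steps (the seed value used to recreate the split is an assumption):

```r
library(randomForest)

# Recreate the split and model (seed value is an assumption)
set.seed(123)
pos_training <- sample(1:nrow(mtcars), size = round(0.8 * nrow(mtcars)))
training <- mtcars[pos_training, ]
validation <- mtcars[-pos_training, ]
rf_model <- randomForest(hp ~ ., data = training)

# Predict hp for the validation cars, excluding the hp column (column 4)
rf_prediction <- predict(rf_model, validation[, -4])

# Compare the observations against the Random Forest predictions
print(data.frame(Obs = validation[, 4], RF = rf_prediction))

# Dotchart of variable importance
varImpPlot(rf_model)
```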
The e1071 package is used to train a support vector machine for classification and regression purposes.
formula: a symbolic description of the model to be fitted
data: a data frame containing the variables in the model
kernel: the kernel used in training and predicting: linear, polynomial, radial, and sigmoid
degree: parameter needed for kernel of type polynomial
gamma: parameter needed for all kernels except linear
coef0: parameter needed for kernels of type polynomial and sigmoid
na.action: A function to specify the action to be taken if NAs are found
There are more parameters related to this function. Please type ?svm to see them.
Similarly to RF, the formula that will be applied is \(hp\sim.\). Let’s use the different kernels.
svm_model_lin <- svm(hp ~ ., data = training, kernel = "linear")
svm_model_poly <- svm(hp ~ ., data = training, kernel = "polynomial")
svm_model_rad <- svm(hp ~ ., data = training, kernel = "radial")
svm_model_sig <- svm(hp ~ ., data = training, kernel = "sigmoid")
Now we can predict the hp values:
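The prediction chunk was not displayed; a sketch for one kernel (the seed used to recreate the split is an assumption) is:

```r
library(e1071)

# Recreate the split (seed value is an assumption)
set.seed(123)
pos_training <- sample(1:nrow(mtcars), size = round(0.8 * nrow(mtcars)))
training <- mtcars[pos_training, ]
validation <- mtcars[-pos_training, ]

svm_model_lin <- svm(hp ~ ., data = training, kernel = "linear")

# Predict hp for the validation cars, excluding the hp column (column 4);
# the same call applies to the polynomial, radial, and sigmoid models
svm_prediction_lin <- predict(svm_model_lin, validation[, -4])
```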
Finally, we can summarise the results.
data.frame(Obs = validation[,4], SVM_lin = svm_prediction_lin,
SVM_poly = svm_prediction_poly, SVM_rad = svm_prediction_rad,
SVM_sig = svm_prediction_sig)
## Obs SVM_lin SVM_poly SVM_rad SVM_sig
## Hornet 4 Drive 110 121.51574 123.47778 106.47474 120.99467
## Merc 280 123 156.98698 146.04602 125.07476 142.84838
## Merc 450SL 180 156.65475 168.71280 171.96696 160.63893
## Cadillac Fleetwood 205 236.92939 223.57702 211.27667 226.52371
## Toyota Corona 97 45.72309 80.75757 91.49229 82.29314
## Fiat X1-9 66 69.46686 77.96054 77.80608 68.92851
The neuralnet package allows training of neural networks using backpropagation or resilient backpropagation (RPROP), with or without weight backtracking.
formula: a symbolic description of the model to be fitted
data: a data frame containing the variables in the model
hidden: a vector of integers specifying the number of hidden neurons (vertices) in each layer
threshold: a numeric value specifying the threshold for the partial derivatives of the error function as stopping criteria
stepmax: the maximum steps for the training of the neural network
algorithm: a string containing the algorithm type to calculate the neural network. The following types are possible: ‘backprop’, ‘rprop+’, ‘rprop-’, ‘sag’, or ‘slr’
There are more parameters related to this function. Please type ?neuralnet to see them.
The formula that will be applied is \(hp\sim.\).
Now we can predict the hp values:
Finally, we can summarise the results.
data.frame(Obs=validation[,4], NN=nn_prediction)
## Obs NN
## Hornet 4 Drive 110 150.4998
## Merc 280 123 150.4998
## Merc 450SL 180 150.4998
## Cadillac Fleetwood 205 150.4998
## Toyota Corona 97 150.4998
## Fiat X1-9 66 150.4998
The network predicted an almost constant value because the covariates have very different scales, which hampers the training. Let's normalise the data.
norm_df <-rbind(training, validation)
#define Min-Max normalization function
minmax_norm <- function(x) {
(x - min(x)) / (max(x) - min(x))
}
norm <- apply(norm_df, 2, minmax_norm)
head(norm)
## mpg cyl disp hp drat wt
## Chrysler Imperial 0.1829787 1.0 0.9201796 0.6289753 0.2165899 0.9798006
## Volvo 142E 0.4680851 0.0 0.1244699 0.2014134 0.6221198 0.3239581
## Merc 230 0.5276596 0.0 0.1738588 0.1519435 0.5345622 0.4185630
## Maserati Bora 0.1957447 1.0 0.5734597 1.0000000 0.3594470 0.5259524
## Mazda RX4 Wag 0.4510638 0.5 0.2217511 0.2049470 0.5253456 0.3482485
## Merc 450SLC 0.2042553 1.0 0.5106011 0.4522968 0.1428571 0.5796471
## qsec vs am gear carb
## Chrysler Imperial 0.34761905 0 0 0.0 0.4285714
## Volvo 142E 0.48809524 1 1 0.5 0.1428571
## Merc 230 1.00000000 1 0 0.5 0.1428571
## Maserati Bora 0.01190476 0 1 1.0 1.0000000
## Mazda RX4 Wag 0.30000000 0 1 0.5 0.4285714
## Merc 450SLC 0.41666667 0 0 0.0 0.2857143
Now, we can subset the data accordingly.
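The subsetting chunk was not displayed; since the training rows were stacked before the validation rows in norm_df, a sketch (with the split recreated under an assumed seed) is:

```r
# Recreate the normalised data (seed value is an assumption)
set.seed(123)
pos_training <- sample(1:nrow(mtcars), size = round(0.8 * nrow(mtcars)))
training <- mtcars[pos_training, ]
validation <- mtcars[-pos_training, ]
norm_df <- rbind(training, validation)
minmax_norm <- function(x) (x - min(x)) / (max(x) - min(x))
norm <- apply(norm_df, 2, minmax_norm)

# Training rows come first, validation rows last
norm_training <- as.data.frame(norm[1:nrow(training), ])
norm_validation <- as.data.frame(norm[(nrow(training) + 1):nrow(norm), ])
```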
Finally, we can apply the NN model.
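Two activation functions appear in the results further below (NN_log and NN_tan); a sketch of how such models might be fitted on the normalised data (the seed and the hidden-layer size of 5 neurons are assumptions) is:

```r
library(neuralnet)

# Recreate the normalised training set (seed value is an assumption)
set.seed(123)
pos_training <- sample(1:nrow(mtcars), size = round(0.8 * nrow(mtcars)))
training <- mtcars[pos_training, ]
validation <- mtcars[-pos_training, ]
norm_df <- rbind(training, validation)
minmax_norm <- function(x) (x - min(x)) / (max(x) - min(x))
norm <- apply(norm_df, 2, minmax_norm)
norm_training <- as.data.frame(norm[1:nrow(training), ])

# Logistic and tanh activation functions; one hidden layer of 5 neurons
# is an illustrative choice, not the course's setting
nn_model_log <- neuralnet(hp ~ ., data = norm_training, hidden = 5,
                          act.fct = "logistic")
nn_model_tan <- neuralnet(hp ~ ., data = norm_training, hidden = 5,
                          act.fct = "tanh")
```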
We can plot the NN.
We can plot the NN in a nicer way.
We can use the models to predict the variable.
nn_prediction_log <- predict(nn_model_log, norm_validation[,-4])
nn_prediction_tan <- predict(nn_model_tan, norm_validation[,-4])
To visualise the results:
data.frame(Obs = norm_validation[,4], NN_log = nn_prediction_log, NN_tan = nn_prediction_tan)
## Obs NN_log NN_tan
## Hornet 4 Drive 0.20494700 0.15848709 0.280470546
## Merc 280 0.25088339 0.30682250 0.338217726
## Merc 450SL 0.45229682 0.35768535 0.359986470
## Cadillac Fleetwood 0.54063604 0.61909455 0.705239030
## Toyota Corona 0.15901060 0.10892604 -0.003904399
## Fiat X1-9 0.04946996 0.08748578 0.041047591
Finally, we can backtransform the hp data:
hp_val_log <- nn_prediction_log * (max(norm_df$hp) - min(norm_df$hp)) + min(norm_df$hp)
hp_val_tan <- nn_prediction_tan * (max(norm_df$hp) - min(norm_df$hp)) + min(norm_df$hp)
data.frame(Obs = validation[,4], NN_log = hp_val_log, NN_tan = hp_val_tan)
## Obs NN_log NN_tan
## Hornet 4 Drive 110 96.85185 131.37316
## Merc 280 123 138.83077 147.71562
## Merc 450SL 180 153.22495 153.87617
## Cadillac Fleetwood 205 227.20376 251.58265
## Toyota Corona 97 82.82607 50.89506
## Fiat X1-9 66 76.75848 63.61647
Finally, let's create a summary of the results.
result <- data.frame(Obs = validation[,4], RF = rf_prediction, SVM_lin = svm_prediction_lin,
SVM_poly = svm_prediction_poly, SVM_sig = svm_prediction_sig,
NN_log = hp_val_log, NN_tan = hp_val_tan)
print(result)
## Obs RF SVM_lin SVM_poly SVM_sig NN_log
## Hornet 4 Drive 110 123.43118 121.51574 123.47778 120.99467 96.85185
## Merc 280 123 122.20713 156.98698 146.04602 142.84838 138.83077
## Merc 450SL 180 173.63778 156.65475 168.71280 160.63893 153.22495
## Cadillac Fleetwood 205 219.34986 236.92939 223.57702 226.52371 227.20376
## Toyota Corona 97 96.53314 45.72309 80.75757 82.29314 82.82607
## Fiat X1-9 66 74.71494 69.46686 77.96054 68.92851 76.75848
## NN_tan
## Hornet 4 Drive 131.37316
## Merc 280 147.71562
## Merc 450SL 153.87617
## Cadillac Fleetwood 251.58265
## Toyota Corona 50.89506
## Fiat X1-9 63.61647
Now let's do a more practical example. Imagine that we have two nested catchments and that some data we need are missing in the lower catchment. Let's use machine learning to predict those missing values.
To work with the time series, we will use the hydroTSM package. First, let’s read the data, which is stored as a zoo object.
Let’s separate the data of the catchment and subcatchment accordingly.
As in this case the data are complete, we will create an additional object with the streamflow of the catchment and delete some years of data from it. This way, we will be able to compare the measured values with the predictions.
We can store the dates of the zoo in an object.
dates <- index(q)
head(dates)
## [1] "1990-01-01" "1990-01-02" "1990-01-03" "1990-01-04" "1990-01-05"
## [6] "1990-01-06"
tail(dates)
## [1] "2018-12-26" "2018-12-27" "2018-12-28" "2018-12-29" "2018-12-30"
## [6] "2018-12-31"
Now, we will set the first six years of catch_na to NAs. For this purpose, we will create a pos object that will store the position of the data related to '1995-12-31'.
This way, the object catch will have the complete data.
Meanwhile, the object catch_na will have six years of missing values.
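The chunks for these steps were not displayed; a sketch (using a synthetic zoo series as a stand-in for the real catchment data, which is not available here) is:

```r
library(zoo)

# Synthetic daily series standing in for the catchment streamflow 'catch'
set.seed(1)
dates <- seq(as.Date("1990-01-01"), as.Date("2018-12-31"), by = "day")
catch <- zoo(runif(length(dates), 0.5, 47), dates)

# Position of "1995-12-31" in the series
pos <- which(index(catch) == as.Date("1995-12-31"))

# Copy the complete series, then set the first six years to NA
catch_na <- catch
catch_na[1:pos] <- NA
```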
We will partition the data that will be used for training and verification purposes using the pos object that we created earlier.
# The training set should start one day after "1995-12-31"
start <- pos + 1
subcatch_training <- subcatch[start:length(subcatch)]
catch_training <- catch[start:length(catch)]
head(subcatch_training)
## 1990-01-01 1990-01-02 1990-01-03 1990-01-04 1990-01-05 1990-01-06
## 4.540620 4.428097 4.269241 4.203052 4.090529 4.004482
head(catch_training)
## 1990-01-01 1990-01-02 1990-01-03 1990-01-04 1990-01-05 1990-01-06
## 2.521102 2.452117 2.392539 2.389403 2.285925 2.220075
training <- data.frame(subcatch = subcatch_training, catch = catch_training)
head(training)
## subcatch catch
## 1990-01-01 4.540620 2.521102
## 1990-01-02 4.428097 2.452117
## 1990-01-03 4.269241 2.392539
## 1990-01-04 4.203052 2.389403
## 1990-01-05 4.090529 2.285925
## 1990-01-06 4.004482 2.220075
tail(training)
## subcatch catch
## 2018-12-26 3.263157 1.859470
## 2018-12-27 3.203586 1.803027
## 2018-12-28 3.130777 1.774806
## 2018-12-29 3.177110 1.771670
## 2018-12-30 3.276395 1.909641
## 2018-12-31 3.005016 1.762263
summary(training)
## subcatch catch
## Min. : 0.9664 Min. : 0.4829
## 1st Qu.: 2.7403 1st Qu.: 1.3672
## Median : 4.5208 Median : 2.8190
## Mean : 5.7265 Mean : 4.1321
## 3rd Qu.: 7.2147 3rd Qu.: 5.7070
## Max. :45.2076 Max. :47.0982
## NA's :123 NA's :7
Training can run into problems if there are missing data, so let's exclude the missing values from the training set.
pos_nas <- which(is.na(training$subcatch))
pos_nas <- c(pos_nas, which(is.na(training$catch)))
print(pos_nas)
## [1] 7396 7397 7398 7399 7400 7401 7402 7403 7404 7405 7406 7407 7408 7409 7410
## [16] 7411 7412 7413 7414 7415 7416 7417 7418 7419 7420 7421 7422 7423 7424 7425
## [31] 7426 7427 7428 7429 7430 7431 7432 7433 7434 7435 7436 7437 7438 7439 7440
## [46] 7441 7442 7443 7444 7445 7446 7447 7448 7449 7450 7451 7452 7453 7454 7455
## [61] 7456 8188 8189 9442 9443 9444 9445 9446 9447 9448 9449 9450 9451 9453 9454
## [76] 9455 9456 9457 9458 9460 9461 9462 9463 9464 9465 9466 9467 9468 9469 9470
## [91] 9471 9472 9473 9474 9475 9476 9477 9478 9479 9480 9481 9482 9483 9484 9485
## [106] 9486 9487 9488 9489 9490 9491 9492 9493 9494 9495 9496 9497 9498 9499 9798
## [121] 9799 9800 9801 2237 2238 2239 2240 5537 6686 9831
training <- training[-pos_nas,]
summary(training)
## subcatch catch
## Min. : 0.9664 Min. : 0.4829
## 1st Qu.: 2.7403 1st Qu.: 1.3609
## Median : 4.5274 Median : 2.8692
## Mean : 5.7287 Mean : 4.1585
## 3rd Qu.: 7.2147 3rd Qu.: 5.7383
## Max. :45.2076 Max. :47.0982
The next step is to train the models. For this purpose, we will use Random Forest and Support Vector Machines with the different kernels.
rf_model <- randomForest(catch ~ ., data = training)
svm_model_lin <- svm(catch ~ ., data = training, kernel = "linear")
svm_model_poly <- svm(catch ~ ., data = training, kernel = "polynomial")
svm_model_rad <- svm(catch ~ ., data = training, kernel = "radial")
svm_model_sig <- svm(catch ~ ., data = training, kernel = "sigmoid")
Now, let's prepare the covariate data that will be used to predict the missing catchment values (i.e., the daily streamflow of the subcatchment during the first six years).
covariate <- data.frame(subcatch = subcatch[1:pos])
head(covariate)
## subcatch
## 1990-01-01 4.540620
## 1990-01-02 4.428097
## 1990-01-03 4.269241
## 1990-01-04 4.203052
## 1990-01-05 4.090529
## 1990-01-06 4.004482
tail(covariate)
## subcatch
## 1995-12-26 3.918436
## 1995-12-27 3.819151
## 1995-12-28 3.653676
## 1995-12-29 3.541154
## 1995-12-30 3.441869
## 1995-12-31 3.375679
Finally, we can predict the streamflow values using the different models and the covariate object.
Now that we have the predicted values, we can convert them to zoo objects.
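The chunks for these two steps were not displayed; a sketch (using synthetic stand-in series, since the real data is not available here, and a reduced number of trees for speed) is:

```r
library(zoo)
library(randomForest)

# Synthetic stand-ins for the real catchment and subcatchment series
set.seed(42)
dates <- seq(as.Date("1990-01-01"), as.Date("2018-12-31"), by = "day")
subcatch <- zoo(runif(length(dates), 1, 45), dates)
catch <- zoo(coredata(subcatch) * 0.6 + rnorm(length(dates), 0, 0.2), dates)

# Train on the period after 1995-12-31
pos <- which(dates == as.Date("1995-12-31"))
training <- data.frame(subcatch = coredata(subcatch)[(pos + 1):length(dates)],
                       catch = coredata(catch)[(pos + 1):length(dates)])
rf_model <- randomForest(catch ~ ., data = training, ntree = 100)

# Covariate: subcatchment streamflow for the missing six years
covariate <- data.frame(subcatch = coredata(subcatch)[1:pos])

# Predict and convert to a zoo object with the corresponding dates;
# the same pattern applies to the SVM models
rf_prediction <- predict(rf_model, covariate)
rf_prediction <- zoo(rf_prediction, dates[1:pos])
```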
We can plot the results of the predictions as follows:
plot(catch_na)
lines(rf_prediction, col="green")
lines(svm_prediction_lin, col="blue")
lines(svm_prediction_poly, col="red")
lines(svm_prediction_rad, col="cyan")
lines(svm_prediction_sig, col="purple")
grid()
legend("topright", lwd = 1,
       c("Observed", "RF", "SVM-Linear", "SVM-Polynomial", "SVM-Radial", "SVM-Sigmoid"),
       col = c("black", "green", "blue", "red", "cyan", "purple"), bty = "n")
We can plot the results of the predictions as follows:
Removing the low-performing models:
We can plot the results of the predictions as follows:
We can use a performance indicator to assess which model performed the best. The Kling-Gupta efficiency (KGE; Gupta et al., 2009, Kling et al., 2012) has three components: the Pearson correlation coefficient (\(r\)); the bias ratio (\(\beta\)); and the variability ratio (\(\gamma\)).
\[\scriptsize \textrm{KGE'} = 1 - \sqrt{ (r-1)^2 + (\beta-1)^2 + (\gamma-1)^2 } \]
\[\scriptsize r = \frac{ \sum_{i=1}^n{(O_i - \bar{O}) (S_i - \bar{S}) } }{\sqrt{\sum_{i=1}^n{(O_i - \bar{O})^2}}\sqrt{\sum_{i=1}^n{(S_i - \bar{S})^2}}}; \beta = \frac{\mu_{s}}{\mu_{o}}; \gamma = \frac{CV_{s}}{CV_{o}} = \frac{\sigma_{s}/\mu_{s}}{\sigma_{o}/\mu_{o}} \]
\(\mu\) is the mean streamflow (\(Q\)), \(CV\) is the coefficient of variation, \(\sigma\) represents the standard deviation of \(Q\), and the subscripts \(s\) and \(o\) represent simulated and observed \(Q\), respectively. The KGE’ and its components have their optimum value at one, and its optimisation seeks to reproduce the temporal dynamics (measured by \(r\)), while preserving the volume and variability of \(Q\), measured by \(\beta\) and \(\gamma\), respectively.
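As a sanity check on the definition, KGE’ can also be computed directly in base R. This is a sketch of the formula above, not the hydroGOF implementation, and the observation values are made up for illustration:

```r
# Kling-Gupta efficiency (2012 formulation) from its three components
kge <- function(obs, sim) {
  r <- cor(obs, sim)                                       # temporal dynamics
  beta <- mean(sim) / mean(obs)                            # bias ratio
  gamma <- (sd(sim) / mean(sim)) / (sd(obs) / mean(obs))   # variability ratio
  1 - sqrt((r - 1)^2 + (beta - 1)^2 + (gamma - 1)^2)
}

# Illustrative values: a perfect simulation yields the optimum, KGE' = 1
obs <- c(2.5, 3.1, 4.0, 2.2, 5.6)
kge(obs, obs)  # 1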
We can create an object that stores the observations of the verification period.
Finally, we can apply the KGE’ by using the KGE function from the hydroGOF package.
KGE(obs = obs, sim = rf_prediction, method = "2012", out.type = "full")
## $KGE.value
## [1] 0.9234464
##
## $KGE.elements
## r Beta Gamma
## 0.9682697 1.0035594 0.9304229
KGE(obs = obs, sim = svm_prediction_lin, method = "2012", out.type = "full")
## $KGE.value
## [1] 0.9080331
##
## $KGE.elements
## r Beta Gamma
## 0.9559886 0.9970294 0.9193025
KGE(obs = obs, sim = svm_prediction_poly, method = "2012", out.type = "full")
## $KGE.value
## [1] 0.08816593
##
## $KGE.elements
## r Beta Gamma
## 0.6845051 0.7331579 1.8128343
KGE(obs = obs, sim = svm_prediction_rad, method = "2012", out.type = "full")
## $KGE.value
## [1] 0.8847093
##
## $KGE.elements
## r Beta Gamma
## 0.9517110 0.9925498 0.8955749
KGE(obs = obs, sim = svm_prediction_sig, method = "2012", out.type = "full")
## $KGE.value
## [1] -90.43503
##
## $KGE.elements
## r Beta Gamma
## -0.7320031 -90.2282000 -4.8975169